88 research outputs found
Sparse Learning over Infinite Subgraph Features
We present a supervised-learning algorithm from graph data (a set of graphs)
for arbitrary twice-differentiable loss functions and sparse linear models over
all possible subgraph features. To date, it has been shown that under all
possible subgraph features, several types of sparse learning, such as Adaboost,
LPBoost, LARS/LASSO, and sparse PLS regression, can be performed. Particular
emphasis is placed on simultaneous learning of relevant features from an
infinite set of candidates. We first generalize techniques used in all these
preceding studies to derive a unifying bounding technique for arbitrary
separable functions. We then carefully use this bounding to make block
coordinate gradient descent feasible over infinite subgraph features, resulting
in a fast converging algorithm that can solve a wider class of sparse learning
problems over graph data. We also empirically study the differences from the
existing approaches in convergence property, selected subgraph features, and
search-space sizes. We further discuss several unnoticed issues in sparse
learning over all possible subgraph features. Comment: 42 pages, 24 figures, 4 tables
Mining significant substructure pairs for interpreting polypharmacology in drug-target network.
A key feature of drug-target networks is that drugs often bind to multiple targets, a phenomenon known as polypharmacology or drug promiscuity. Recent literature has indicated that relatively small fragments in both drugs and targets are crucial in forming polypharmacology. We hypothesize that the principles behind polypharmacology are embedded in paired fragments in the molecular graphs and amino acid sequences of drug-target interactions. We developed a fast, scalable algorithm for mining significantly co-occurring subgraph-subsequence pairs from drug-target interactions. A noteworthy feature of our approach is that it captures significant subgraph-subsequence pairs, whereas the literature so far has considered patterns of drugs or targets alone. Significant substructure pairs allow the grouping of drug-target interactions into clusters, covering approximately 75% of interactions containing approved drugs. These clusters were highly mutually exclusive and statistically significant, implying that each cluster corresponds to a distinct type of polypharmacology. These exclusive clusters cannot be easily obtained by using either drug or target information alone but are naturally found by highlighting significant substructure pairs in drug-target interactions. These results confirm the effectiveness of our method for interpreting polypharmacology in drug-target networks.
Machine learning refinement of in situ images acquired by low electron dose LC-TEM
We study a machine learning (ML) technique for refining images acquired
during in situ observation using liquid-cell transmission electron microscopy
(LC-TEM). Our model is constructed using a U-Net architecture and a ResNet
encoder. For training our ML model, we prepared an original image dataset that
contained pairs of images of samples acquired with and without a solution
present. The former images were used as noisy images and the latter images were
used as corresponding ground truth images. The number of pairs of image sets
was and the image sets included images acquired at several different
magnifications and electron doses. The trained model converted a noisy image
into a clear image. The time necessary for the conversion was on the order of
10ms, and we applied the model to in situ observations using the software Gatan
DigitalMicrograph (DM). Even if a nanoparticle was not visible in a view window
in the DM software because of the low electron dose, it was visible in a
successive refined image generated by our ML model. Comment: 33 pages, 9 figures
Machine learning reveals orbital interaction in crystalline materials
We propose a novel representation of crystalline materials named
orbital-field matrix (OFM) based on the distribution of valence shell
electrons. We demonstrate that this new representation can be highly useful in
mining material data. Our experiment shows that the formation energies of
crystalline materials, the atomization energies of molecular materials, and the
local magnetic moments of the constituent atoms in transition metal–rare-earth
metal bimetal alloys can be predicted with high accuracy using the OFM.
Knowledge regarding the role of coordination numbers of transition-metal and
rare-earth metal elements in determining the local magnetic moment of
transition metal sites can be acquired directly from decision tree regression
analyses using the OFM. Comment: 10 pages
SiBIC: A Tool for Generating a Network of Biclusters Captured by Maximal Frequent Itemset Mining
Biclustering extracts genes that are coexpressed under certain experimental conditions, providing more precise insight into genetic behavior than one-dimensional clustering. To understand the biological features of genes in a single bicluster, visualizations such as heatmaps or parallel coordinate plots and tools for enrichment analysis are widely used. However, handling many biclusters simultaneously remains a challenge. We therefore developed a web service named SiBIC, which uses maximal frequent itemset mining to exhaustively discover significant biclusters and then turns them into networks of overlapping biclusters, where nodes are gene sets and edges show their overlaps in the detected biclusters. SiBIC provides a graphical user interface for manipulating a gene set network, where users can find target gene sets based on the enriched network. This chapter provides a user guide to SiBIC, together with the background to its development. SiBIC is available at http://utrecht.kuicr.kyoto-u.ac.jp:8080/sibic/faces/index.jsp
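As a rough illustration of the maximal frequent itemset idea mentioned in this abstract, the following Python sketch mines a toy dataset. It is not SiBIC's implementation: the gene names, condition labels, the `min_support` threshold, and both function names are hypothetical, and the level-wise search is a simple Apriori-style stand-in that only suits tiny inputs. Each gene is treated as a "transaction" listing the conditions under which it is assumed to be expressed, so a maximal frequent itemset corresponds to a largest set of conditions shared by at least `min_support` genes.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset contained in at least
    min_support transactions (simple level-wise, Apriori-style search)."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    frequent = {}
    candidates = [frozenset([item]) for item in items]
    while candidates:
        # Count support for the current level and keep the frequent ones.
        level = {}
        for cand in candidates:
            support = sum(1 for t in transactions if cand <= t)
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into candidate (k+1)-itemsets.
        keys = list(level)
        candidates = list(
            {a | b for a, b in combinations(keys, 2) if len(a | b) == len(a) + 1}
        )
    return frequent


def maximal_frequent_itemsets(transactions, min_support):
    """Keep only frequent itemsets that have no frequent proper superset."""
    frequent = frequent_itemsets(transactions, min_support)
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other for other in frequent)
    }


# Hypothetical toy data: gene -> conditions under which it is expressed.
genes = {
    "g1": {"c1", "c2", "c3"},
    "g2": {"c1", "c2", "c3"},
    "g3": {"c1", "c2"},
    "g4": {"c3", "c4"},
    "g5": {"c3", "c4"},
}

result = maximal_frequent_itemsets(genes.values(), min_support=2)
for conditions, support in result.items():
    print(sorted(conditions), support)
```

On this toy input the two maximal sets are {c1, c2, c3} and {c3, c4}, each shared by two genes. They overlap in c3, which is the kind of overlap the abstract describes as an edge in the bicluster network.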
Efficiently finding genome-wide three-way gene interactions from transcript- and genotype-data
Motivation: We address the issue of finding a three-way gene interaction, i.e. two genes interacting in expression under the genotypes of another gene, given a dataset in which expressions and genotypes are measured at once for each individual. Such an interaction can be a general switching mechanism in the expression of two genes, controlled by categories of another gene, and finding this type of interaction can be a key to elucidating complex biological systems. The most suitable method for this issue is the likelihood ratio test using logistic regressions, which we call the interaction test, but a serious problem of this test is its computational intractability at a genome-wide level.
Mining metabolic pathways through gene expression
Motivation: An observed metabolic response is the result of the coordinated activation and interaction between multiple genetic pathways. However, because of the complex structure of metabolism, it is not fully understood which pathways are required to produce an observed metabolic response. In this article, we propose an approach that can identify the genetic pathways which dictate the response of a metabolic network to specific experimental conditions.